Abstract. In this paper we present a self-contained analysis and description of the novel ab initio folding algorithm cross, which generates the minimum free energy (mfe), 3-noncrossing, σ-canonical RNA structure. Here an RNA structure is 3-noncrossing if it does not contain three or more mutually crossing arcs, and σ-canonical if each of its stacks has size greater than or equal to σ. Our notion of mfe-structure is based on a specific concept of pseudoknots and respective loop-based energy parameters. The algorithm decomposes into three parts: the first is the inductive construction of motifs and shadows, the second is the generation of the skeleta-trees rooted in irreducible shadows and the third is the saturation of skeleta via context dependent dynamic programming routines.



1. Introduction and background 



In this paper we introduce the ab initio folding algorithm cross, which folds RNA (ribonucleic acid) sequences [49] into pseudoknot structures. We give a self-contained presentation and analysis of cross, whose source code is publicly available at

www.combinatorics.cn/cbpc/cross.html

Supplementary material, such as a detailed description of the loop-energies and all implementation details, can be found at the above web-site. Let us begin by providing some background on RNA sequences and structures. An RNA molecule is firstly described by its primary sequence, a linear string composed of the four nucleotides A, G, U and C together with the Watson-Crick (A-U, G-C) and (U-G) base pairing rules. Secondly, RNA, structurally less constrained than its chemical relative DNA, folds into helical structures by pairing the nucleotides and thereby lowering its free energy, see Fig. 1. Accordingly, RNA exhibits a variety of 3-dimensional structural
Figure 1. The phenylalanine tRNA (re)visited: (a) represents the structure of phenylalanine tRNA, as folded by ViennaRNA [17] [19]. (b) shows the phenylalanine structure as folded by cross with minimum stack size 3. Since the output of cross contains no stack of size < 3, (b) differs slightly from (a) in the region 48 to 60.



configurations, the so called tertiary structures, determining the functionality of the molecule. 
Besides the noncrossing base pairings found in RNA secondary structures there exist further types 
of nucleotide interactions [53] . These bonds are called pseudoknots and occur in functional RNA 
like for instance RNAseP [3^ as well as ribosomal RNA [53] . Indeed, RNA exhibits a diversity of 
biochemical capabilities [2], proved by the discovery of catalytic RNAs, or ribozymes [SHIj in 1981. 
Like proteins, RNA is capable of catalyzing reactions whereas transfer RNA acts as a messenger 
between DNA and protein. 




Figure 2. The HDV-pseudoknot structure: (a) displays the structure as folded by Rivas and Eddy's algorithm [40]. (b) shows the structure as folded by cross with minimum stack size 3.
k             2        3        4        5        6         7         8         9
growth rate   2.6180   4.7913   6.8541   8.8875   10.9083   12.9226   14.9330   16.9410

Table 1. The exponential growth rates of k-noncrossing RNA structures (minimum arc-length greater than or equal to two).



In light of these RNA functionalities the question of RNA structure prediction appears to be of relevance. The first mfe-folding algorithms for RNA secondary structure are due to [12] [28] [8] and the first DP folding routines for secondary structures were given by Waterman et al. [iSl [54] [34], predicting the loop-based mfe-secondary structure [49] in O(n^3)-time and O(n^2)-space. The general problem of RNA structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots [31]. There exist, however, polynomial time folding algorithms capable of the energy based prediction of certain pseudoknots: Rivas et al. [40], Uemura et al. [50], Akutsu [3] and Lyngsø [31]. In the following we shall use the term pseudoknot synonymously with cross-serial dependencies between pairs of nucleotides [45] H]. As for the ab initio folding of pseudoknot RNA, we find the following two paradigms: Rivas and Eddy's [40] gap-matrix variant of Waterman's DP-folding routine for secondary structures [IS] [HI [20] [HI [31], and maximum weighted matching algorithms [TTJ [13], the latter tailored for pseudoknot prediction [5] [47]. The former method folds into a somewhat "mysterious" class of pseudoknots [41] in polynomial time. Algorithms along these lines have been developed by Dirks and Pierce [9], Reeder and Giegerich [36] and Ren et al. [39]. Additional ideas for pseudoknot folding involve the iterated loop matching approach [42] and the sampling of RNA structures via the Markov-chain Monte-Carlo method [33].

Let us now have a closer look at the DP-paradigm by means of analyzing the algorithm of Rivas and Eddy [JD] [U [TU]. In the course of our analysis we shall make two key observations: first, DP algorithms inevitably produce arbitrarily high crossing numbers, see Tab. 1, and second, not all 3-noncrossing RNA structures can be generated by dynamic programming algorithms, at least not with the implemented truncations. The generation of high crossing numbers is problematic insofar as it implies a very large output class. Already for k = 4, i.e. for RNA structures exhibiting three mutually crossing arcs, we have an exponential growth rate of 6.8541, a growth rate exceeding that of the number of natural sequences. In other words, only for an exponentially small fraction of these structures will we find a sequence folding into it. Remarkably, this growth rate appears to
Matrices          (i,j)      (r,s)
whx(i,j;r,s)      unknown    unknown
vhx(i,j;r,s)      paired     paired
yhx(i,j;r,s)      unknown    paired
zhx(i,j;r,s)      paired     unknown

Table 2. The gap-matrices whx, vhx, yhx and zhx.



grow linearly in k, see Tab. 1. Any type of study along the lines of [44] [23] [43] [38] fTS l [15] [16], which is based on such an algorithm, is purely computational and does not allow one to deduce generic properties in the sense of [15].

Let us now define the non gap-matrices (vx, wx) and the gap-matrices (whx, vhx, zhx and yhx) [40] [35]. The non gap-matrices, vx and wx, are two triangular n x n matrices, where vx(i,j) is the score of the best folding between positions i and j, provided that i,j are paired to each other, whereas wx(i,j) is the score of the best folding between positions i and j, regardless of whether i,j are paired or not, see Tab. 2. The gap-matrices are pairs of matrices, αhx(i,j;r,s), where α = w,v,z,y, and are the scores of the best folding depending on the relation between the positions i,j and the relation between positions r,s, respectively, see Fig. 3. The key idea in Rivas and Eddy's algorithm is to use gap-matrices as a generalization of the non gap-matrices wx and vx. In particular, both concepts merge for r = s − 1, where we have for any i < r < j

(1.1) whx(i,j;r,r+1) = wx(i,j)

(1.2) zhx(i,j;r,r+1) = vx(i,j).

Figure 3. Non gap- and gap-matrices. The non gap-matrices wx, vx and the gap-matrices whx, vhx, yhx and zhx.
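The roles of the two non gap-matrices can be illustrated by a minimal sketch. It is not the loop-based energy model of the algorithms discussed here: as a simplifying assumption the score is the number of base pairs (to be maximized), and the function name `max_pairs` and the parameter `min_arc` are hypothetical. Only the division of labour between vx (i and j paired to each other) and wx (no assumption on i, j) is taken from the text.

```python
from functools import lru_cache

# Watson-Crick and G-U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NEG_INF = float("-inf")


def max_pairs(seq, min_arc=4):
    """Maximum number of noncrossing base pairs with arc-length >= min_arc."""
    n = len(seq)
    if n == 0:
        return 0

    @lru_cache(maxsize=None)
    def vx(i, j):
        # best score on [i, j] provided that i and j are paired to each other
        if j - i < min_arc or (seq[i], seq[j]) not in PAIRS:
            return NEG_INF
        return 1 + wx(i + 1, j - 1)

    @lru_cache(maxsize=None)
    def wx(i, j):
        # best score on [i, j], regardless of whether i and j are paired
        if i >= j:
            return 0
        best = max(wx(i + 1, j), wx(i, j - 1), vx(i, j))
        for k in range(i + 1, j):  # bifurcation into two independent segments
            best = max(best, wx(i, k) + wx(k + 1, j))
        return best

    return wx(0, n - 1)


print(max_pairs("GGGAAAACCC"))
```

Since both matrices only ever combine nested or adjacent segments, this kind of recursion cannot produce crossing arcs; the gap-matrices are what lift that restriction.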

In Fig. 4 we illustrate the recursion for wx and vx in the pseudoknot algorithm truncated at O(whx + whx + whx). We can draw the following two conclusions:

• by design, the inductive formation of gap-matrices generates arbitrarily high numbers of mutually crossing arcs, see Fig. 5.

• nonplanar, 3-noncrossing pseudoknots cannot be generated by inductively forming pairs of gap-matrices, see Fig. 6.

Figure 4. The basic recursions: recursion for vx and wx truncated at O(whx + whx + whx) in Rivas and Eddy's algorithm.

In order to avoid any confusion: gap-matrices can and will generate nonplanar arc configurations, however, they can only facilitate this via increasing the crossing number, Fig. 5. Fig. 6 makes evident that the situation is more complex: nonplanarity is not tied to crossings, since there are planar as well as nonplanar 3-noncrossing structures.



2. Specifying an output: k-noncrossing, canonical RNA structures



The previous section showed that, for RNA pseudoknot structures, DP-algorithms fold into an uncontrollably large set of structures. This phenomenon is in stark contrast to the situation for RNA secondary structures. The standard DP-routine cannot produce any crossings, whence it a priori produces secondary structures. We now follow in the footsteps of Waterman by generalizing his strategy for the case of secondary structures to pseudoknot structures. Accordingly, the first step is to specify a combinatorial output class. To this end we shall provide some basic facts on a particular representation of RNA structures.

Figure 5. No control over crossings: Here we show how to build a 4-noncrossing RNA pseudoknot with gap-matrices. Iterating the formation of gap-matrices will produce higher and higher crossings.

Figure 6. Two nonplanar, 3-noncrossing RNA structures, which cannot be generated by pairs of gap-matrices.

A k-noncrossing diagram is a labeled graph over the vertex set [n] with vertex degrees ≤ 1, represented by drawing its vertices 1, . . . , n in a horizontal line and its arcs (i, j), where i < j, in the upper half-plane, containing at most k − 1 mutually crossing arcs. The vertices and arcs correspond to nucleotides and Watson-Crick (A-U, G-C) and (U-G) base pairs, respectively. Diagrams have the following three key parameters: the maximum number of mutually crossing


FOLDING 3-NONCROSSING RNA PSEUDOKNOT STRUCTURES 






arcs, k − 1, the minimum arc-length, λ, and the minimum stack-length, σ ((k, λ, σ)-diagrams). The length of an arc (i, j) is given by j − i and a stack of length σ is the sequence of "parallel" arcs of the form

(2.1) ((i, j), (i+1, j-1), \dots, (i+(\sigma-1), j-(\sigma-1))),

see Fig. 7. We call an arc of length λ a λ-arc.
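The three parameters can be read off a diagram mechanically. The following is a small sketch under the assumption that a diagram is given as a list of arcs (i, j) with i < j and pairwise distinct endpoints; the helper names are hypothetical. A diagram is k-noncrossing exactly when `crossing_number` returns at most k − 1.

```python
from bisect import bisect_left


def crossing_number(arcs):
    """Largest number of mutually crossing arcs in the diagram."""
    arcs = sorted(arcs)
    best = 0
    for a, (_, j1) in enumerate(arcs):
        # arcs that start after arcs[a] and cross it: i1 < i < j1 < j
        candidates = [(i, j) for (i, j) in arcs[a + 1:] if i < j1 < j]
        # among these, a set is mutually crossing iff the termini increase
        # along with the origins: longest increasing subsequence on j
        tails = []
        for _, j in candidates:
            pos = bisect_left(tails, j)
            if pos == len(tails):
                tails.append(j)
            else:
                tails[pos] = j
        best = max(best, 1 + len(tails))
    return best


def min_arc_length(arcs):
    """Minimum of j - i over all arcs (i, j)."""
    return min(j - i for i, j in arcs)


def stack_lengths(arcs):
    """Lengths of all maximal stacks ((i, j), (i+1, j-1), ...)."""
    arc_set = set(arcs)
    lengths = []
    for i, j in arc_set:
        if (i - 1, j + 1) in arc_set:
            continue  # not the outermost arc of its stack
        t = 0
        while (i + t, j - t) in arc_set:
            t += 1
        lengths.append(t)
    return lengths


example = [(1, 10), (2, 9), (3, 8), (5, 14), (6, 13), (7, 12)]
print(crossing_number(example), min_arc_length(example), min(stack_lengths(example)))
```

The example consists of two crossing stacks of length three, so it is a 3-noncrossing, 3-canonical diagram with minimum arc-length 5.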




Figure 7. k-noncrossing diagrams: we display a 4-noncrossing, arc-length λ ≥ 4 and σ ≥ 1 (upper) and a 3-noncrossing, λ ≥ 4 and σ ≥ 2 (lower) diagram.

We are now in position to specify the output-set. We shall consider RNA pseudoknot structures that are 3-noncrossing, σ-canonical with σ ≥ 3, and have a minimum arc-length λ ≥ 4. The 3-noncrossing property is mostly for algorithmic convenience and the generalization to higher crossing numbers does not represent a major obstacle. We consider 3-canonical structures, i.e. those in which each stack has length at least three, since we are interested in minimum free energy structures. Finally, the minimum arc-length of four is a result of biophysical constraints. Accordingly, we shall identify pseudoknot RNA structures with (k, 4, σ)-diagrams and refer to them simply as (k, σ)-structures, implicitly assuming the minimum arc-length λ ≥ 4. In Fig. 8 we present a particular 3-noncrossing, 3-canonical RNA structure: the HDV-virus as folded by cross.

We next present some of the combinatorics of (3, σ)-structures. Let $T_{k,\sigma}^{[4]}(n)$ denote the number of k-noncrossing, σ-canonical RNA structures over [n]. The generating function,

T_{k,\sigma}^{[4]}(z) = \sum_{n \ge 0} T_{k,\sigma}^{[4]}(n) z^n, \quad k, \sigma \ge 3,

of k-noncrossing, σ-canonical RNA structures has been obtained in [32]. This function is closely related to $F_k(z) = \sum_{n \ge 0} f_k(2n,0) z^{2n}$, the ordinary generating function of k-noncrossing matchings. Beyond functional equations implied directly by the reflection-principle [14], the following
Figure 8. The HDV-virus pseudoknot structure as folded by cross (b). This structure differs from the natural structure displayed in (a) [1] by exactly seven base pairs.

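asymptotic formula has been derived [27]

(2.2) \forall k \in \mathbb{N}, \quad f_k(2n,0) \sim c_k \, n^{-((k-1)^2+(k-1)/2)} \, (2(k-1))^{2n}, \quad c_k > 0.

Setting

w_0(x) = \frac{x^{2\sigma-2}}{1 - x^2 + x^{2\sigma}}, \qquad v_0(x) = 1 - x + w_0(x)x^2 + w_0(x)x^3 + w_0(x)x^4,

we can now state

Theorem 2.1. Let k, σ ∈ N, where k, σ ≥ 3, let x be an indeterminate and $\rho_k$ the dominant, positive real singularity of $F_k(z)$. Then $T_{k,\sigma}^{[4]}(x)$, the generating function of (k, σ)-structures, is given by

(2.3) T_{k,\sigma}^{[4]}(x) = \frac{1}{v_0(x)} \, F_k\!\left( \frac{\sqrt{w_0(x)}\, x}{v_0(x)} \right).

Furthermore, the asymptotic formula

(2.4) T_{k,\sigma}^{[4]}(n) \sim c_k \, n^{-(k-1)^2-(k-1)/2} \left( \frac{1}{\gamma_{k,\sigma}^{[4]}} \right)^{n}, \quad \text{for } k = 3, 4, \dots, 9,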
          k = 3    k = 4    k = 5    k = 6    k = 7    k = 8    k = 9
σ = 3     2.0348   2.2644   2.4432   2.5932   2.7243   2.8414   2.9480
σ = 4     1.7898   1.9370   2.0488   2.1407   2.2198   2.2896   2.3523
σ = 5     1.6465   1.7532   1.8330   1.8979   1.9532   2.0016   2.0449
σ = 6     1.5515   1.6345   1.6960   1.7457   1.7877   1.8243   1.8569
σ = 7     1.4834   1.5510   1.6008   1.6408   1.6745   1.7038   1.7297
σ = 8     1.4319   1.4888   1.5305   1.5639   1.5919   1.6162   1.6376
σ = 9     1.3915   1.4405   1.4763   1.5049   1.5288   1.5494   1.5677

Table 3. Exponential growth rates of (k, σ)-structures.



holds, where $\gamma_{k,\sigma}^{[4]}$ is the minimal positive real solution of the equation $\frac{\sqrt{w_0(x)}\, x}{v_0(x)} = \rho_k$.

Theorem 2.1 implies exact enumeration results as well as an array of exponential growth rates indexed by k and σ. The latter are presented in Tab. 3 and are of relevance in the context of the asymptotic analysis of the algorithm. In addition, Tab. 3 shows that 3-noncrossing, σ-canonical RNA structures have remarkably moderate growth rates. σ-canonical structures with higher crossing numbers also exhibit moderate growth rates, indicating that generalizations of the current implementation of cross from k = 3 to k = 4 or 5 are feasible.
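As a numerical cross-check, the growth rates can be recomputed from the singularity equation of Theorem 2.1. The sketch below rests on two assumptions that should be stated plainly: the closed forms used for w_0(x) and v_0(x) are w_0(x) = x^(2σ−2)/(1 − x^2 + x^(2σ)) and v_0(x) = 1 − x + w_0(x)(x^2 + x^3 + x^4), and ρ_k = 1/(2(k−1)) is read off from the exponential factor in eq. (2.2). The function names are hypothetical.

```python
import math


def w0(x, sigma):
    return x ** (2 * sigma - 2) / (1 - x ** 2 + x ** (2 * sigma))


def v0(x, sigma):
    return 1 - x + w0(x, sigma) * (x ** 2 + x ** 3 + x ** 4)


def growth_rate(k, sigma, step=1e-3):
    """1/gamma, gamma being the minimal positive root of sqrt(w0)*x/v0 = rho_k."""
    if k < 3:
        raise ValueError("Theorem 2.1 is stated for k >= 3")
    rho = 1.0 / (2 * (k - 1))  # assumption: taken from eq. (2.2)

    def g(x):
        return math.sqrt(w0(x, sigma)) * x / v0(x, sigma) - rho

    # scan upwards from 0 for the first sign change, then bisect
    lo, hi = 0.0, step
    while g(hi) < 0:
        lo, hi = hi, hi + step
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 1.0 / hi


for sigma in (3, 4):
    print(sigma, [round(growth_rate(k, sigma), 4) for k in (3, 4)])
```

Scanning from zero before bisecting is a deliberate choice: the theorem asks for the minimal positive solution, and a plain bisection over a wide bracket would not guarantee that.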

3. Loops, motifs and shadows 

Suppose we are given a (3, σ)-structure, S. Let β be an S-arc and denote the set of S-arcs that cross β by $\mathcal{A}_S(\beta)$. Clearly we have

(3.1) \beta \in \mathcal{A}_S(\alpha) \iff \alpha \in \mathcal{A}_S(\beta).

An arc α ∈ $\mathcal{A}_S(\beta)$ is called a minimal, β-crossing if there exists no α' ∈ $\mathcal{A}_S(\beta)$ such that α' ≺ α. Note that α ∈ $\mathcal{A}_S(\beta)$ can be minimal β-crossing, while β is not minimal α-crossing. We call a pair of crossing arcs (α, β) balanced, if α is minimal, β-crossing and β is minimal α-crossing, respectively. 3-noncrossing diagrams exhibit the following four basic loop-types:

(1) a hairpin-loop, being a pair

((i, j), [i+1, j-1])

where (i, j) is an arc and [i, j] is an interval, i.e. a sequence of consecutive vertices (i, i+1, . . . , j−1, j).

(2) an interior-loop, being a sequence

((i_1, j_1), [i_1+1, i_2-1], (i_2, j_2), [j_2+1, j_1-1]),

where (i_2, j_2) is nested in (i_1, j_1).

(3) a multi-loop, see Fig. 9, being a sequence

((i_1, j_1), [i_1+1, \omega_1-1], S_{\omega_1}^{\tau_1}, [\tau_1+1, \omega_2-1], S_{\omega_2}^{\tau_2}, \dots)

where $S_{\omega_h}^{\tau_h}$ denotes a pseudoknot structure over $[\omega_h, \tau_h]$ (i.e. nested in (i_1, j_1)) and subject to the following condition: if all $S_{\omega_h}^{\tau_h} = (\omega_h, \tau_h)$, i.e. all substructures are simply arcs, for all h, then h ≥ 2.

Figure 9. The standard loop-types: hairpin-loop (top), interior-loop (middle) and multi-loop (bottom). These represent all loop-types that occur in RNA secondary structures.

We finally define pseudoknot-loops:

(4) a pseudoknot, see Fig. 10, consists of the following data:

(P1) a set of arcs

P = \{(i_1, j_1), (i_2, j_2), \dots, (i_t, j_t)\},

where $i_1 = \min\{i_s\}$ and $j_t = \max\{j_s\}$, such that

(i) the diagram induced by the arc-set P is irreducible, i.e. the line-graph of P is connected and

(ii) for each $(i_s, j_s) \in P$ there exists some arc β (not necessarily contained in P) such that $(i_s, j_s)$ is minimal β-crossing.

(P2) all vertices $i_1 < r < j_t$, not contained in hairpin-, interior- or multi-loops.

Figure 10. Pseudoknots: we display a balanced (top) and an unbalanced pseudoknot (bottom). The latter contains the stack over (3, 24), which is minimal for the arc (9, 30), which is not contained in the pseudoknot.

We call a pseudoknot balanced if its arc-set can be decomposed into pairs of balanced arcs.
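The notions of crossing set, minimal crossing and balanced pair translate directly into code. The sketch below assumes arcs are given as tuples (i, j) with i < j and that ≺ denotes strict nesting as in eq. (3.2); all function names are hypothetical.

```python
def crosses(a, b):
    """True if the arcs a and b cross."""
    (i, j), (r, s) = a, b
    return i < r < j < s or r < i < s < j


def nested_in(a, b):
    """True if a is strictly nested in b (a precedes b in the partial order)."""
    return b[0] < a[0] and a[1] < b[1]


def crossing_set(arcs, beta):
    """All arcs of the structure that cross beta."""
    return [a for a in arcs if crosses(a, beta)]


def is_minimal_crossing(arcs, alpha, beta):
    """alpha crosses beta and no other beta-crossing arc is nested in alpha."""
    crossing = crossing_set(arcs, beta)
    return alpha in crossing and not any(nested_in(a, alpha) for a in crossing)


def is_balanced(arcs, alpha, beta):
    return (is_minimal_crossing(arcs, alpha, beta)
            and is_minimal_crossing(arcs, beta, alpha))


arcs = [(2, 6), (4, 10), (5, 9)]
# (2, 6) is minimal (4, 10)-crossing, but (4, 10) is not minimal
# (2, 6)-crossing, since (5, 9) also crosses (2, 6) and is nested in (4, 10)
print(is_minimal_crossing(arcs, (2, 6), (4, 10)),
      is_minimal_crossing(arcs, (4, 10), (2, 6)),
      is_balanced(arcs, (2, 6), (5, 9)))
```

The example reproduces the asymmetry noted in the text: minimality of α with respect to β does not imply minimality of β with respect to α.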



3.1. Motifs and shadows. Let ≺ denote the partial order over the set of arcs (written as (i, j), i < j) of a k-noncrossing diagram, given by

(3.2) (i_1, j_1) \prec (i_2, j_2) \iff i_2 < i_1 \;\wedge\; j_1 < j_2.

A k-noncrossing core is a k-noncrossing diagram without any two arcs of the form (i, j), (i+1, j−1). Any k-noncrossing RNA structure, S, has a unique k-noncrossing core, c(S) [25], obtained in two steps: first one identifies all arcs contained in stacks, inducing a contracted diagram, and secondly one relabels the vertices. Note that the core-map does in general not preserve arc-length.
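The two steps of the core-map can be sketched as follows, assuming a structure is given by its length n and a list of arcs; the function name `core` and the convention of keeping the outermost arc of each stack as its representative are choices made for this illustration.

```python
def core(n, arcs):
    """Return (length, arcs) of the core: collapse each stack, then relabel."""
    arc_set = set(arcs)
    removed = set()   # endpoints of arcs that are merged into an outer arc
    kept = []         # one representative (the outermost arc) per stack
    for i, j in sorted(arc_set):
        if (i - 1, j + 1) in arc_set:
            removed.update((i, j))
        else:
            kept.append((i, j))
    remaining = [v for v in range(1, n + 1) if v not in removed]
    relabel = {v: t for t, v in enumerate(remaining, start=1)}
    return len(remaining), sorted((relabel[i], relabel[j]) for i, j in kept)


# the arc (1, 6) has length 5 in the structure but length 3 in the core
print(core(12, [(1, 6), (3, 12), (4, 11), (5, 10)]))
```

The example shows why the core-map does not preserve arc-length: collapsing the stack over (3, 12) removes two vertices lying under the arc (1, 6).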




Figure 11. Core-structures: A structure, S, (lhs) is mapped into its core c(S) (rhs). Clearly S has arc-length ≥ 4 and, as a consequence of the collapse of the stack ((4, 13), (5, 12), (6, 11)) (the red arcs are being removed) into the arc (2, 5), c(S) contains the arc (1, 3). This arc becomes, after relabeling, a 2-arc.
Definition 1. (Motif) A (k, σ)-motif, m, is a (k, σ)-structure over [n], having the following properties:

(M1) m has a nonnesting core.

(M2) All m-arcs are contained in stacks of length exactly σ ≥ 3 and have length λ ≥ 4.

The set of all motifs is denoted by $M_k^\sigma(n)$ and we set $\mu^*_{k,\sigma}(n) = |M_k^\sigma(n)|$.

Property (M1) is obviously equivalent to: all arcs of the core, c(m), are ≺-maximal.

Let S be a (3, σ)-structure. We call two k-noncrossing diagrams $\delta_1, \delta_2$ adjacent if and only if $\delta_2$ is derived from $\delta_1$ by selecting a pair of isolated $\delta_1$-vertices, i < j, such that (i − 1, j + 1) is a $\delta_1$-arc, and adding the arc (i, j). With respect to this notion of adjacency the set of k-noncrossing diagrams over [n] becomes a directed graph, which we denote by $\mathcal{G}_k(n)$.

Definition 2. (Shadow) A shadow of S is a $\mathcal{G}_k(n)$-vertex connected to S by a $\mathcal{G}_k(n)$-path.

Intuitively speaking, a shadow is derived by extending the stacks of a structure from top to bottom.

Theorem 3.1. Suppose k, σ ≥ 2.

(a) Any k-noncrossing, σ-canonical RNA structure corresponds to a unique sequence of shadows.

(b) Any (3, σ)-structure has a unique loop-decomposition.

Proof. Ad (a). Suppose S is an arbitrary (k, σ)-structure over [n]. We prove the theorem by induction on the number of S-arcs. We consider the set of ≺-maximal elements, S* = {(i, j) | (i, j) is ≺-maximal}. Clearly, S* induces a unique (k, σ)-motif, $m_{k,\sigma}(S)$, contained in S. Indeed, since S is by assumption σ-canonical, each S*-arc occurs in a stack of size ≥ σ. By definition, any S-arc which is contained in a stack containing a (unique) S*-arc is an arc of a unique shadow, $s_{k,\sigma}(S)$. Removing all arcs contained in $m_{k,\sigma}(S)$, the remaining diagram is still k-noncrossing and σ-canonical. To see this it suffices to observe that any S-arc not contained in $m_{k,\sigma}(S)$ is contained in a stack of size ≥ σ not containing any $m_{k,\sigma}(S)$-arcs. Assertion (a) follows now by induction on the number of arcs.

Ad (b). Let c(S) be the core of S. We shall color the c(S)-arcs, α = (i, j), as follows:

Case (1): $\mathcal{A}_{c(S)}(\alpha) \neq \varnothing$.

Since c(S) is a 3-noncrossing diagram, we have for any two (i, j), (i', j') ∈ $\mathcal{A}_{c(S)}(\beta)$, either (i, j) ≺ (i', j') or j < i'. Therefore for any β ∈ $\mathcal{A}_{c(S)}(\alpha)$ there exists a unique ≺-minimal arc α*(β) ∈ $\mathcal{A}_{c(S)}(\beta)$ that is nested in α. If there exists some β for which α = α*(β) holds, i.e. α itself is minimal in $\mathcal{A}_{c(S)}(\beta)$, then we color α red. In other words, red arcs are minimal with respect to some crossing β. Otherwise, for any β ∈ $\mathcal{A}_{c(S)}(\alpha)$ there exists some α*(β) ≺ α. If α*(β) is the unique ≺-maximal substructure nested in α, then we color α green, and blue otherwise.

Case (2): $\mathcal{A}_{c(S)}(\alpha) = \varnothing$, i.e. α is noncrossing in c(S).

If there exists no c(S)-arc α' ≺ α, then we color α purple; if there exists exactly one maximal c(S)-arc α' ≺ α, we color α green, and blue otherwise. It follows now by induction on the number of c(S)-arcs that this procedure generates a well defined arc-coloring. Let i ∈ [n] be a vertex. We assign to i either the color of the minimal non-red c(S)-arc (r, s) for which r < i < s holds, or red if there exist only red c(S)-arcs (r, s) with r < i < s, and black otherwise. By construction, this induces a vertex-arc coloring with the property of correctly identifying all hairpin- (purple arcs and vertices), interior- (green arcs and vertices), multi- (blue arcs and vertices) and pseudoknot-loops (red arcs and vertices). □




Figure 12. Shadows and loops: we give the sequence of shadows (top) and the loop-decomposition (below) illustrating Theorem 3.1. Here I (purple) is a hairpin-loop, II (green) represents an interior-loop, III (blue) is a multi-loop and finally IV (red) is a (balanced) pseudoknot.

In Fig. 12 we show how these decompositions work.
4. Phase I: motif-generation

The first step in cross consists in creating some kind of shelling of a 3-noncrossing, canonical structure via motifs. One key idea in cross is the identification of motifs as building blocks. The key point here is that, despite the fact that motifs exhibit complicated crossings, they can be inductively generated. This is remarkable and a result of considering the "dual" of a motif, which turns out to be a restricted Motzkin-path. The latter is obtained via the bijection of Proposition 4.1 between crossing and nesting arcs.

A Motzkin-path is composed of up-, down- and horizontal-steps. It starts at the origin, stays in the upper halfplane and ends on the x-axis. Let $\mathrm{Mo}_k^\sigma(n)$ denote the following set of Motzkin-paths:

(a) the paths have height ≤ σ(k − 1)

(b) all up- and down-steps come only in sequences of length σ

(c) all plateaux at height σ have length ≥ 3.

Let $\mu_{k-1,\sigma}(n)$ denote the number of Motzkin-paths of length n that (a') have height ≤ σ(k − 2), (b') up- and down-steps come only in sequences of length σ. We set for arbitrary k, σ ≥ 2

G^*_{k,\sigma}(z) = \sum_{n \ge 0} \mu^*_{k,\sigma}(n) z^n

G_{k-1,\sigma}(z) = \sum_{n \ge 0} \mu_{k-1,\sigma}(n) z^n

G_{1,\sigma}(z) = \frac{1}{1-z}.
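Now we are in position to give the main result of this section:

Proposition 4.1. Suppose k, σ ≥ 2, then the following assertions hold:

(a) There exists a bijection

(4.1) \beta \colon M_k^\sigma(n) \longrightarrow \mathrm{Mo}_k^\sigma(n).

(b) We have the following recurrence equations

(4.2) \mu^*_{k,\sigma}(n) = \mu^*_{k,\sigma}(n-1) + \sum_{s=0}^{n-(2\sigma+3)} \mu_{k-1,\sigma}(n-2\sigma-s)\, \mu^*_{k,\sigma}(s) \quad \text{for } n > 2\sigma

(4.3) \mu_{k,\sigma}(n) = \mu_{k,\sigma}(n-1) + \sum_{s=0}^{n-2\sigma} \mu_{k-1,\sigma}(n-2\sigma-s)\, \mu_{k,\sigma}(s) \quad \text{for } n > 2\sigma - 1,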
s=0 
σ           2        3        4        5        6        7
ζ_σ^{-1}    1.7424   1.5457   1.4397   1.3721   1.3247   1.2894
c_σ         0.1077   0.0948   0.0879   0.0840   0.0804   0.0780

Table 4. The exponential growth rates of $\mu^*_{3,\sigma}(n)$.



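where $\mu^*_{k,\sigma}(n) = 1$ for 0 ≤ n ≤ 2σ and $\mu_{k-1,\sigma}(n) = 1$ for 0 ≤ n ≤ 2σ − 1.

(c) We have the following formula for the generating functions

(4.4) G^*_{k,\sigma}(z) = \frac{1}{1 - z - z^{2\sigma}\left( G_{k-1,\sigma}(z) - (z^2 + z + 1) \right)}

(4.5) G_{k-1,\sigma}(z) = \frac{1}{1 - z - z^{2\sigma}\, G_{k-2,\sigma}(z)},

and, in particular, for k = 3 we have the following asymptotic formula

(4.6) \mu^*_{3,\sigma}(n) \sim c_\sigma \left( \frac{1}{\zeta_\sigma} \right)^{n},

where $c_\sigma$ and $\zeta_\sigma^{-1}$ are given by Tab. 4.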
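The recurrences and the generating functions of Proposition 4.1 lend themselves to a quick numerical check. The sketch below makes the following assumptions explicit: the recurrences are taken in the form μ*(n) = μ*(n−1) + Σ_{s=0}^{n−(2σ+3)} μ_{k−1}(n−2σ−s) μ*(s) and μ_k(n) = μ_k(n−1) + Σ_{s=0}^{n−2σ} μ_{k−1}(n−2σ−s) μ_k(s) with empty sums for small n; μ_{1,σ}(n) = 1 for all n, since a path of height 0 consists of horizontal steps only; and the growth rate for k = 3 is obtained as the smallest positive zero of the denominator 1 − z − z^(2σ)(G_{2,σ}(z) − (z^2 + z + 1)) with G_{2,σ}(z) = 1/(1 − z − z^(2σ)/(1 − z)). Function names are hypothetical.

```python
def motif_counts(sigma, n_max, k=3):
    """[mu*_{k,sigma}(0), ..., mu*_{k,sigma}(n_max)] via the two recurrences."""
    prev = [1] * (n_max + 1)  # mu_{1,sigma}(n) = 1 (assumption: height 0)
    for _ in range(2, k):     # build mu_{2,sigma}, ..., mu_{k-1,sigma}
        cur = [1] + [0] * n_max
        for n in range(1, n_max + 1):
            cur[n] = cur[n - 1] + sum(
                prev[n - 2 * sigma - s] * cur[s]
                for s in range(0, n - 2 * sigma + 1))
        prev = cur
    star = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        # the inner path has length >= 3, hence s <= n - (2*sigma + 3)
        star[n] = star[n - 1] + sum(
            prev[n - 2 * sigma - s] * star[s]
            for s in range(0, n - 2 * sigma - 2))
    return star


def motif_growth_rate(sigma):
    """1/zeta for k = 3, zeta being the smallest positive pole of G*_{3,sigma}."""
    def below_pole(x):
        den2 = 1 - x - x ** (2 * sigma) / (1 - x)  # denominator of G_{2,sigma}
        if den2 <= 0:
            return False
        g2 = 1 / den2
        return 1 - x - x ** (2 * sigma) * (g2 - (x * x + x + 1)) > 0

    lo, hi = 0.0, 0.999
    for _ in range(60):
        mid = (lo + hi) / 2
        if below_pole(mid):
            lo = mid
        else:
            hi = mid
    return 1.0 / hi


print(motif_counts(3, 10))
print(round(motif_growth_rate(3), 4))
```

For σ = 3 the first nontrivial motif appears at n = 9 (three up-steps, a plateau of length three, three down-steps), which is why the counts stay at 1 up to n = 8.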



Proof. Let m be a (k, σ)-motif. We construct the bijection β as follows: reading the vertex labels of m in increasing order we map each σ-tuple of origins and termini into a σ-tuple of up-steps and down-steps, respectively. Furthermore isolated points are mapped into horizontal-steps. The resulting paths are by construction Motzkin-paths of height ≤ σ(k − 1). Since motifs have arcs of length ≥ 4, the paths have at height σ plateaux of length ≥ 3. In addition we have σ-tuples of up- and down-steps. Therefore β is well defined. To see that β is bijective we construct its inverse explicitly. Consider an element ζ ∈ $\mathrm{Mo}_k^\sigma(n)$. We shall pair σ-tuples of up-steps and down-steps as follows: proceeding from left to right we pair the first up-step tuple with the first down-step tuple and proceed inductively, see Fig. 13. It is clear from the definition of Motzkin-paths that this pairing procedure is well defined. Each such pair

((u_i, u_{i+1}, \dots, u_{i+\sigma-1}), (d_j, d_{j+1}, \dots, d_{j+\sigma-1}))

corresponds uniquely to the sequence of arcs ((i + σ − 1, j), . . . , (i, j + σ − 1)), from which we can conclude that ζ induces a unique σ-canonical diagram, $\delta_\zeta$, over [n]. Furthermore $\delta_\zeta$ has by construction a nonnesting core. A diagram contains a k-crossing if and only if it contains a sequence of arcs $(i_1, j_1), \dots, (i_k, j_k)$ such that $i_1 < i_2 < \dots < i_k < j_1 < j_2 < \dots < j_k$. Therefore $\delta_\zeta$ is k-noncrossing if and only if its underlying path ζ has height < σk. We immediately derive $\beta(\delta_\zeta) = \zeta$, whence β is a bijection. Using the Motzkin-path interpretation we immediately observe that $\mathrm{Mo}_k^\sigma(n)$-paths can be constructed recursively from paths that start with a horizontal-step or an up-step, respectively. The recursions eq. (4.2) and eq. (4.3) and the generating functions of eq. (4.4) and eq. (4.5) are straightforwardly derived. As for the particular case $G^*_{3,\sigma}(z)$, we have

(4.7) G^*_{3,\sigma}(z) = \frac{1}{1 - z - z^{2\sigma}\left( \dfrac{1}{1 - z - \frac{z^{2\sigma}}{1-z}} - (z^2 + z + 1) \right)}.

The unique dominant, real singularities of $G^*_{3,\sigma}(z)$ are simple poles, denoted by $\zeta_\sigma$. Being a rational function, $G^*_{3,\sigma}(z)$ admits a partial fraction expansion

G^*_{3,\sigma}(z) = H(z) + \sum_{(\zeta, r)} \frac{c_{(\zeta, r)}}{(\zeta - z)^r}

and eq. (4.6) follows in view of

(4.8) [z^n] \frac{1}{\zeta - z} = \frac{1}{\zeta} [z^n] \frac{1}{1 - z/\zeta} = \frac{1}{\zeta} \binom{n}{0} \left( \frac{1}{\zeta} \right)^{n} = \left( \frac{1}{\zeta} \right)^{n+1}.

□

Figure 13. The bijection β: First we have a map from (a) to (b). Then we pair the σ-tuples of up-steps and down-steps, see the vertical map from (b) to (c). The so derived pairs, see the horizontal map from (c) to (d), allow us to reconstruct the original motif.
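The bijection β and its inverse, as described in the proof, can be sketched in a few lines. The encoding of a path as a string over U, D, H and the function names are assumptions made for this illustration; the inverse pairs the t-th σ-tuple of up-steps with the t-th σ-tuple of down-steps, which is exactly the left-to-right pairing of the proof.

```python
def motif_to_path(n, arcs):
    """Map origins to up-steps, termini to down-steps, isolated points to H."""
    steps = ["H"] * n
    for i, j in arcs:
        steps[i - 1] = "U"
        steps[j - 1] = "D"
    return "".join(steps)


def path_to_motif(path, sigma):
    """Inverse map: pair up-step tuples and down-step tuples from left to right."""
    ups = [p for p, c in enumerate(path, start=1) if c == "U"]
    downs = [p for p, c in enumerate(path, start=1) if c == "D"]
    if len(ups) != len(downs) or len(ups) % sigma:
        raise ValueError("up- and down-steps must come in matching sigma-tuples")
    arcs = []
    for t in range(0, len(ups), sigma):
        up_tuple = ups[t:t + sigma]
        down_tuple = downs[t:t + sigma]
        # (u_{i+sigma-1}, d_j), ..., (u_i, d_{j+sigma-1}): one stack of length sigma
        arcs.extend(zip(reversed(up_tuple), down_tuple))
    return sorted(arcs)


# two crossing stacks of length 3 over [19]
motif = [(1, 13), (2, 12), (3, 11), (7, 19), (8, 18), (9, 17)]
path = motif_to_path(19, motif)
print(path)
print(path_to_motif(path, 3) == sorted(motif))
```

Pairing in first-in, first-out order is what yields a nonnesting core; a last-in, first-out pairing of the same path would instead produce nested stacks.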
5. Phase II: the skeleta-tree

In this section we enter the second phase of cross. What will happen here is that each irreducible shadow, generated during the first phase described in Section 4, gives rise to a tree of skeleta. The intuition behind this construction is that each tree-vertex, i.e. each skeleton, represents a maximal "non-inductive" arc configuration. This does not mean that a skeleton contains all crossing arcs of the final structure, but all further crossings are derived by adding independent substructures. In other words: their energy contributions are additive.

A skeleton, S, is a 3-noncrossing structure whose core has no noncrossing arcs, i.e. for any arc α we have $\mathcal{A}_S(\alpha) \neq \varnothing$, see Fig. 14. In addition, in a skeleton over the segment {i, i + 1, . . . , j − 1, j}, $S_{i,j}$, the positions i and j are paired. Recall that an interval is a sequence of consecutive, unpaired bases (i, i + 1, . . . , j), where i − 1 and j + 1 are paired. Furthermore, recall that a stack of length σ (see eq. (2.1)) is a sequence of parallel arcs ((i, j), (i + 1, j − 1), . . . , (i + (σ − 1), j − (σ − 1))), which we write as (i, j, σ). Note that σ ≥ σ_0, where σ_0 is the minimum stack length of the structure, see Fig. 14. An irreducible shadow over {i, i + 1, . . . , j − 1, j} is denoted by $IS_{i,j}$. It is a particular skeleton, i.e. a skeleton in which there are no nested arcs.

Remark 1. In our implementation of cross, the number of stacks of an irreducible shadow is an input parameter. As default we set its maximum value to be three.

Figure 14. Irreducible shadows and skeleta: (a) an irreducible shadow, containing the stacks (1, 20, 3) and (7, 30, 4). (b) A skeleton drawn with its four induced intervals $I_1, I_2, I_3, I_4$.



We are now in position to construct the skeleta-tree. Suppose we are given a 3-noncrossing skeleton, S. We label the S-intervals $\{I_1, \dots, I_m\}$ from left to right and consider pairs (S, r), where r is an integer 1 ≤ r ≤ m − 1. Given a pair (S, r) we construct new pairs (S', r'), where r' ≥ r, as follows: we replace a pair of intervals $(I_p, I_q)$, $i \in I_p$, $j \in I_q$, i > r, by the stack α = (i, j, σ), subject to the following conditions

• S' is a 3-noncrossing skeleton

• (i + σ − 1, j − σ + 1) is a minimal element in (S', ≺)

• r' is the label of the first paired base preceding the interval $I_j$

• i − 1 and j + 1 are not paired to each other.

Fig. 15 displays the two basic scenarios via which stacks are being inserted. We refer to the above procedure as (i, j, σ)-insertion and formally express it via

(5.1) (S, r) \Rightarrow_{(i,j,\sigma)} (S', r').

Figure 15. Stack-insertion: if the origin of the inserted stack (i, j, σ) is smaller than that of its predecessor (a), then r = r'. Paraphrasing the situation we can express this as "left-insertion" freezes the index r. Accordingly, (b) showcases the "right-insertion", with its induced shift of the indices r → r'; both indices are drawn in red.

Given a pair (S, r), subsequent insertions induce a directed graph, $G_{(S,r)}$, whose vertices are pairs (S', r') and whose (directed) arcs are given by

(5.2) ((S, r), (S', r')), \quad \text{where } (S, r) \Rightarrow_{(i,j,\sigma)} (S', r').

Remark 2. Note that the algorithm checks whether (i, j, σ) can be added, i.e. (1) the bases {i, i + 1, . . . , i + σ − 1, j − σ + 1, . . . , j − 1, j} are indeed unpaired and (2) (i − 1, j + 1) is not a base pair. The second property guarantees that the core of the stack (i, j, σ) is an arc in the core of S'.
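The two checks of Remark 2 can be sketched as follows. This is only an illustration of those two checks, not of the full insertion rule: the 3-noncrossing, skeleton and minimality conditions are not tested here, the overlap test on the two halves of the stack is an added sanity check, and the function names are hypothetical.

```python
def can_insert(arcs, i, j, sigma):
    """Checks (1) and (2) of Remark 2 for the stack (i, j, sigma)."""
    arc_set = set(arcs)
    paired = {v for arc in arc_set for v in arc}
    new_bases = set(range(i, i + sigma)) | set(range(j - sigma + 1, j + 1))
    if len(new_bases) < 2 * sigma:
        return False  # the two halves overlap (added sanity check)
    if new_bases & paired:
        return False  # (1) some base of the stack is already paired
    if (i - 1, j + 1) in arc_set:
        return False  # (2) the stack would merely extend an existing stack
    return True


def insert_stack(arcs, i, j, sigma):
    """Return the arc list with the stack (i, j, sigma) added."""
    if not can_insert(arcs, i, j, sigma):
        raise ValueError("stack cannot be inserted")
    return sorted(list(arcs) + [(i + t, j - t) for t in range(sigma)])


# irreducible shadow with the stacks (1, 20, 3) and (7, 30, 4)
shadow = [(1, 20), (2, 19), (3, 18), (7, 30), (8, 29), (9, 28), (10, 27)]
print(can_insert(shadow, 12, 24, 3), can_insert(shadow, 4, 17, 3))
```

In the example the stack (4, 17, 3) is rejected by check (2), since (3, 18) is already a base pair and the new arcs would simply lengthen that stack rather than add a new arc to the core.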



We proceed by showing that $G_{(S,r)}$ is in fact a tree. In other words, the insertion-procedure is an unambiguous grammar.

Proposition 5.1. Let $T_1 = \{S \mid \exists\, r;\ (S, r) \in T\}$ and $S_0$ be a 3-noncrossing skeleton.

(a) $G_{(S_0, r_0)}$ is a tree and for any two different vertices $(S_1, r_1)$ and $(S_2, r_2)$ of $G_{(S_0, r_0)}$, we have $S_1 \neq S_2$.

(b) For k > 3, the graph morphism π: T → $T_1$, given by π((S, r)) = S, is not bijective.

Remark 3. For any k > 3, $G_{(S_0, r_0)}$ is a tree. However assertion (b) indicates that it is really a tree of pairs. That means, stack-insertions will in general generate two different pairs with equal first coordinate.



Proof. We prove assertion (a) by induction on the number of inserted arcs, £. For £ = there 
is nothing to prove. For £ ~ I, the pairs (5, tq) and (5",r') differ by exactly one stack, {i, j,<j), 
whence the assertion. Our objective is now to show that for any two {S[,r[) and (S'2, r'2) obtained 
from the root {S,ro) via £ insertions, S[ ^ S'2 holds. Suppose there exists some {S,f), such that 



(5.3) 



(Si,r[ 




If the inserted stacks coincide, we have {S'l, r'^) = (S'2,r2) and there is nothing to prove. Otherwise, 
we obtain S[ ^ S'2, which implies {S[,r[) 7^ {^211^2)^ whence (a). Suppose next, we have the 
following situation 

{So,ro) 

unique path^^ ^\ unique path 



(5.4) 



(^i,ri) 

insertion 



(S'2, r2) 

inse 



insertion 



where the uniqueness of the paths ending at (Si,ri) and (S2,r2) is guaranteed by the induction 
hypothesis. By assumption we have (Si,ri) ^ (S2,r2) and Si and S'l as well as S'2 and 52 differ 
by exactly one stack. Again by induction hypothesis, we have S'l ^ S'2, whence 



(5.5) {Sun) 



{S[,r[), {S2,r2) ^0=ii„j„a,) {S'^A) and 5i ^ 52- 
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We now prove the inductive step by contradition. Suppose we have S[ = 82, then we can conclude 
that a ^ (3 and there exists some {S, f) such that 



(5.6) 




Indeed, we define $\tilde S$ to be the skeleton derived from $(S_0, r_0)$ by inserting all
$S'_1$-arcs except $\alpha$ and $\beta$. It is clear that the skeleton $\tilde S$ exists, since its
stack-set is a subset of the stack-set of $S'_1$. By construction, $\tilde S$ differs from $S_1$ and
$S_2$ via the stacks $\beta$ and $\alpha$, respectively. By induction hypothesis, there exists a
unique path from $(S_0, r_0)$ to $(\tilde S, \tilde r)$, which implies the existence of a unique
$\tilde r$. Furthermore, by induction hypothesis, the paths from $(S_0, r_0)$ to $(S_1, r_1)$ and
$(S_2, r_2)$ are unique and consequently contain $(\tilde S, \tilde r)$, whence we have the situation
given in eq. (5.6).
As $\alpha$ and $\beta$ are both minimal, without loss of generality we may assume
$i_\alpha < i_\beta$. Let us consider the insertion-path
$(\tilde S, \tilde r) \Rightarrow_\beta (S_1, r_1) \Rightarrow_\alpha (S'_1, r'_1)$. According to this
insertion, we obtain $r_1 < i_\alpha$ and by construction $[r_1 + 1, i_\beta - 1]$ is an
$S_1$-interval. If $j_\alpha < i_\beta$, then $\alpha$ does not cross any arcs in $S_1$, which is
impossible. If $j_\alpha > j_\beta$, we arrive at $\beta \prec \alpha$, which contradicts minimality of
$\alpha$. Therefore, we have $i_\beta < j_\alpha < j_\beta$, i.e. the arcs $\alpha$ and $\beta$ are
crossing. Next we consider
$(\tilde S, \tilde r) \Rightarrow_\alpha (S_2, r_2) \Rightarrow_\beta (S'_2, r'_2)$. Accordingly,
$\alpha$ must be crossed by some $(\tilde S, \tilde r)$-stack, say
$\gamma = (i_\gamma, j_\gamma, \sigma_\gamma)$. We next put $\gamma$ into the context of the
insertion-path $(\tilde S, \tilde r) \Rightarrow_\beta (S_1, r_1) \Rightarrow_\alpha (S'_1, r'_1)$ and
observe that $\gamma$ necessarily crosses $\beta$. Indeed, otherwise we have the following three
scenarios: $i_\gamma > j_\beta$; $j_\gamma < r_1$; or $i_\gamma < r_1$, $j_\gamma > j_\beta$. In all
three cases $\gamma$ cannot cross $\alpha$ since $i_\gamma, j_\gamma \notin [r_1 + 1, i_\beta - 1]$,
see Fig. 16. As a result, $\gamma$ necessarily crosses both stacks $\alpha$ and $\beta$, which
contradicts the fact that $S'_1$ is a 3-noncrossing skeleton, whence $S'_1 \neq S'_2$. In particular
we obtain $(S'_1, r'_1) \neq (S'_2, r'_2)$, the insertion path is unique and $G_{(S_0, r_0)}$ is a tree.

In order to prove (b) we provide in Fig. 17 an example where the implication
$(S_1, r_1) \neq (S_2, r_2) \Rightarrow S_1 \neq S_2$ does not hold. Note that $G_{(S_0, 0)}$ is still
a tree.



□ 
Figure 16. Illustration of the proof of Proposition 5.1. The three different scenarios
for a noncrossing $\gamma$, representing stacks by isolated arcs: (a) $j_\gamma < r_1$,
(b) $i_\gamma > j_\beta$ and (c) $i_\gamma < r_1$, $j_\gamma > j_\beta$.
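The crossing relation that drives the contradiction in the proof can be made concrete with a
brute-force check. The following Python sketch is an illustration only: it represents each stack by a
single arc $(i, j)$, as in Fig. 16, and the function names `crosses` and `is_k_noncrossing` are
hypothetical and not taken from the cross source code.

```python
from itertools import combinations


def crosses(a, b):
    """True if arcs a = (i, j) and b = (k, l) cross: i < k < j < l or k < i < l < j."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def is_k_noncrossing(arcs, k=3):
    """True if no k arcs of the collection are mutually crossing (brute force)."""
    return not any(
        all(crosses(a, b) for a, b in combinations(group, 2))
        for group in combinations(arcs, k)
    )
```

With $\alpha = (2, 6)$, $\beta = (4, 9)$ and $\gamma = (5, 11)$ all three arcs cross pairwise, so the
collection fails the check for $k = 3$, which is the situation the proof rules out.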



Next we prove that our unambiguous grammar indeed generates any skeleton, which contains a 
given irreducible shadow. 

Proposition 5.2. Suppose we are given an irreducible shadow $S_0 = IS_{i,j}$. Let
$\mathbb{T}(S_0) = G_{(S_0, 0)}$ denote its skeleton-tree and let $\mathbb{S}(S_0)$ be the set of all
skeleta that contain $S_0$. Then we have

(5.7)  $\mathbb{T}(S_0) = \mathbb{S}(S_0)$.

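Proof. Let $\mathcal{A}_S$ denote the set of $S$-arcs. Obviously, for any vertex
$(S, r) \in \mathbb{T}(S_0)$, $S$ is a 3-noncrossing skeleton such that
$\mathcal{A}_{S_0} \subseteq \mathcal{A}_S$, whence $\mathbb{T}(S_0) \subseteq \mathbb{S}(S_0)$ holds.
For an arbitrary 3-noncrossing skeleton $S$, let $\mathcal{A}^{\prec}_S$ denote the set of all nested
stacks in $S$. Since each arc is either maximal or nested we have
$\mathcal{A}_S = \mathcal{A}_{S_0} \,\dot\cup\, \mathcal{A}^{\prec}_S$. Sorting via the linear ordering
of their leftmost paired base, we obtain the sequence
$\mathcal{A}^{\prec}_S = (\alpha_1, \alpha_2, \dots, \alpha_n)$. We choose the first element
$\alpha_k \in \mathcal{A}^{\prec}_S$ which is intersecting $S_0$ (not necessarily $\alpha_1$). Then we
have

(5.8)  $(S_0, r_0) \Rightarrow_{\alpha_k} (S_1, r_1)$,

where $S_1 \in \mathbb{T}(S_0)$. We proceed inductively, setting
$\mathcal{A}^{\prec}_S := \mathcal{A}^{\prec}_S \setminus \{\alpha_k\}$, until
$\mathcal{A}^{\prec}_S = \emptyset$. By construction, each $S_i$ is in $\mathbb{T}(S_0)$, and
$S_n = S$. Accordingly, we constructed an insertion-path in $\mathbb{T}(S_0)$ from $S_0$ to $S$, from
which $\mathbb{S}(S_0) \subseteq \mathbb{T}(S_0)$ follows. □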
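The constructive half of this proof can be sketched in a few lines of Python. This is a simplified
illustration under stated assumptions, not the routine used in cross: stacks are represented by single
arcs, the second coordinate $r$ of the pairs $(S, r)$ is ignored, and "intersecting" is read as
"crossing some arc of the skeleton built so far". The names `crosses` and `insertion_order` are
hypothetical.

```python
def crosses(a, b):
    """True if arcs a = (i, j) and b = (k, l) cross."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def insertion_order(shadow_arcs, skeleton_arcs):
    """Order in which the non-shadow arcs of a skeleton can be inserted into the shadow.

    At each step the remaining arcs are scanned by leftmost paired base and the
    first one crossing the current skeleton is inserted.
    """
    current = list(shadow_arcs)
    pending = sorted(set(skeleton_arcs) - set(shadow_arcs))  # sorted by left endpoint
    order = []
    while pending:
        nxt = next((a for a in pending if any(crosses(a, c) for c in current)), None)
        if nxt is None:
            raise ValueError("remaining arcs cross nothing: not a skeleton over this shadow")
        pending.remove(nxt)
        current.append(nxt)
        order.append(nxt)
    return order
```

For the shadow `[(4, 8), (6, 10)]` and the additional arcs `(1, 3)` and `(2, 5)`, the arc `(2, 5)` is
inserted first although `(1, 3)` has the smaller left endpoint, which mirrors the remark that the
chosen element is not necessarily $\alpha_1$.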



6. Phase III: Saturation 

In this section we discuss the third phase of cross. The skeleta-trees constructed in the second
phase organize the non-inductive substructures of an irreducible shadow derived in phase one.
The objective of the saturation phase is to inductively "fill" the remaining intervals of a given
skeleton with specific substructures. Basically, all routines employed here follow the DP-paradigm.
However, we store a vector of structures rather than energies and implement context-sensitive
DP-routines.

Figure 17. Illustration of assertion (b) of Proposition 5.1: the case $k > 3$. While
$\mathbb{T}(S_0)$ is still a tree (over pairs), the implication
$(S_1, r_1) \neq (S_2, r_2) \Rightarrow S_1 \neq S_2$ does not hold in general.

Suppose we are given a skeleta-tree $\mathbb{T}(S_0)$ with root $S_0$. Let the order of $S$,
$\omega(S)$, denote the number of $\prec$-maximal $S$-arcs, see Fig. 18. Furthermore, let
$\Sigma_{i,j}$ and $\Sigma^{[r]}_{i,j}$ be some subset of structures over
$\{i, i + 1, \dots, j - 1, j\}$ and those of order $r$, respectively. Let $\mathbb{M}_{i,j}$ denote
the set of saturated skeleta over $\{i, i + 1, \dots, j - 1, j\}$ and
$OSM(i, j) \in \mathbb{M}_{i,j}$ be an mfe-saturated skeleton. Furthermore, let $OS(i, j)$ be an
mfe-structure, which is a union of disjoint $OSM(i_1, j_1), \dots, OSM(i_r, j_r)$ and unpaired
nucleotides. By $OSM^{[x]}(i, j)$ and $OS^{[x]}(i, j)$ we denote the respective OSM and OS structures
of order $x$. In order to describe the context-sensitive saturation procedure in cross we denote by
$OS_{mul}(i, j)$, $OS_{pk}(i, j)$ and $OS_o(i, j)$ the mfe-structures nested in a multi-loop,
pseudoknot and otherwise, respectively.

Figure 18. Order: in (a) we display a structure of order one; (b) showcases a structure
of order two.
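To make the notion of order concrete, the following Python sketch counts the maximal arcs of a
structure given as a list of arcs $(i, j)$. It is a minimal illustration that assumes an arc is
maximal exactly when no other arc encloses it; the function name `order` is hypothetical.

```python
def order(arcs):
    """Number of maximal arcs, i.e. arcs (i, j) not enclosed by another arc (k, l) with k < i, j < l."""
    return sum(
        not any(k < i and j < l for (k, l) in arcs)
        for (i, j) in arcs
    )
```

A structure such as `[(1, 10), (2, 5), (6, 9)]` has order one, while `[(1, 4), (5, 8)]` has order
two, matching the two situations displayed in Fig. 18.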
Figure 19. OS versus OSM: we display an $OSM(i, j)$ (a) and an $OS(i, j)$ structure (b).
The $OS(i, j)$ structure shown in (b) is evidently a union of the structures $OSM(i, s)$
and $OSM(s + \dots)$ and the unpaired nucleotide at position $i$.



For a given skeleton $S_{i,j}$, we specify the mapping $S_{i,j} \mapsto OSM(S_{i,j})$ as follows:
suppose $S_{i,j}$ has $n_1$ intervals, $I_1, \dots, I_{n_1}$, labelled from left to right. For a given
interval $I_r = [i_r, j_r]$ and $s_r \in \Sigma_{i_r, j_r}$ we consider the insertion of $s_r$ into
$I_r$, distinguishing the following four cases:
Case (1). $I_r$ is contained in a hairpin-loop.

$\omega(s_r) = 0$. That is, we have $s_r = \emptyset$. The loop generated by the $s_r$-insertion
obviously remains a hairpin-loop, i.e. $((i_r - 1, j_r + 1), [i_r, j_r])$, with energy
$H(i_r - 1, j_r + 1)$.

$\omega(s_r) = 1$. Let $(p, q)$ be the unique maximal $s_r$-arc. Then the $s_r$-insertion produces the
interior-loop

$((i_r - 1, j_r + 1), [i_r, p - 1], (p, q), [q + 1, j_r])$,

with energy $I(i_r - 1, j_r + 1, p, q)$. Note that $p = i_r$ implies $q \neq j_r$, and
$s_r \in OSM^{[1]}(p, q)$.

$\omega(s_r) \geq 2$. In this case inserting $s_r$ into $I_r$ creates a multi-loop in which $s_r$ is
nested. Then $s_r \in OS^{[2]}_{mul}(i_r, j_r)$, see Fig. 20. Let $e(s)$ denote the energy of
structure $s$. We select the set of all structures $s_r$ such that

$e(s_r) = \min \begin{cases}
  H(i_r - 1, j_r + 1) \\
  I(i_r - 1, j_r + 1, p, q) + e(OSM^{[1]}(p, q)), & \forall\, i_r \leq p < q \leq j_r \text{ and } p \neq i_r \text{ or } q \neq j_r \\
  M + P_1 + e(OS^{[2]}_{mul}(i_r, j_r)).
\end{cases}$

Here, $M$ is the energy penalty for forming a multi-loop and $P_1$ is the energy score of a
closing-pair in a multi-loop.
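The three alternatives of Case (1) amount to a minimum over a hairpin term, a family of interior-loop
terms and a multi-loop term. The Python sketch below illustrates this selection under stated
assumptions: the energy functions `H`, `I`, `e_osm1` and `e_os_mul` are hypothetical callbacks
supplied by the caller (the last two return `None` when no structure exists), and a single minimizer
is returned, whereas cross keeps all structures attaining the minimum.

```python
def best_hairpin_fill(ir, jr, H, I, e_osm1, e_os_mul, M, P1):
    """Cheapest filling of interval [ir, jr] inside a hairpin-loop (illustrative sketch).

    H(i, j), I(i, j, p, q): hypothetical hairpin and interior-loop energies
    e_osm1(p, q): energy of an order-one saturated skeleton over [p, q], or None
    e_os_mul(i, j): energy of the multi-loop filling of order >= 2, or None
    M, P1: multi-loop penalty and closing-pair score
    """
    candidates = [(H(ir - 1, jr + 1), ("hairpin",))]      # order 0: interval stays empty
    for p in range(ir, jr):                               # order 1: one maximal arc (p, q)
        for q in range(p + 1, jr + 1):
            if (p, q) == (ir, jr):
                continue                                  # excluded by the side condition
            e = e_osm1(p, q)
            if e is not None:
                candidates.append((I(ir - 1, jr + 1, p, q) + e, ("interior", p, q)))
    e = e_os_mul(ir, jr)                                  # order >= 2: multi-loop
    if e is not None:
        candidates.append((M + P1 + e, ("multi",)))
    return min(candidates, key=lambda c: c[0])
```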



Figure 20. Saturation in hairpin-loops: the interval on the left hand side is filled with
substructures $s_r$ such that $\omega(s_r) = 0$ (left), $\omega(s_r) = 1$ (middle) or
$\omega(s_r) \geq 2$ (right).

Case (2). $I_r$ is contained in a pseudoknot loop.

$\omega(s_r) = 0$. That is, we have $s_r = \emptyset$ and the unpaired bases in $I_r$ are considered
to be contained in a pseudoknot.

$\omega(s_r) \geq 1$. In this case $s_r$ is a substructure which is nested in a pseudoknot, see
Fig. 21. As a result our selection criterion is given by

$e(s_r) = \min \{ (j_r - i_r + 1) \cdot Q_{pk},\; e(OS_{pk}(i_r, j_r)) \}$,

where $(j_r - i_r + 1) \in \mathbb{N}$ is the number of unpaired bases in $I_r$, and $Q_{pk}$ is the
energy score of the unpaired bases in a pseudoknot.

Figure 21. Saturation of an interval nested in a pseudoknot.



Case (3). $I_r$ is contained in a multi-loop. In analogy to case (2), we distinguish the following
cases:

$\omega(s_r) = 0$. That is, we have $s_r = \emptyset$. The unpaired bases in $I_r$ are considered to
be contained in a multi-loop.

$\omega(s_r) \geq 1$. In this case, $s_r$ is a substructure nested in a multi-loop, see Fig. 22.
Accordingly, we select all structures satisfying

$e(s_r) = \min \{ (j_r - i_r + 1) \cdot Q_{mul},\; e(OS_{mul}(i_r, j_r)) \}$,

where $Q_{mul}$ denotes the energy score of the unpaired bases in a multi-loop.
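Cases (2) and (3) share the same shape: either the interval stays unpaired, at a cost per base, or it
is filled with the best nested substructure. The following Python sketch illustrates this choice; the
function name `best_nested_fill` and the callback `e_os_nested` (returning `None` when no substructure
exists) are hypothetical, and the per-base score stands for $Q_{pk}$ or $Q_{mul}$ depending on the
context.

```python
def best_nested_fill(ir, jr, q_unpaired, e_os_nested):
    """Choose between leaving [ir, jr] unpaired and inserting a nested substructure.

    q_unpaired: energy score per unpaired base in the enclosing loop type
    e_os_nested(i, j): energy of the best nested substructure over [i, j], or None
    """
    unpaired = (jr - ir + 1) * q_unpaired
    nested = e_os_nested(ir, jr)
    if nested is None or unpaired <= nested:
        return unpaired, "unpaired"
    return nested, "substructure"
```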

Case (4). $I_r$ is contained in an interior-loop. By construction, the latter is formed by the pair
$(I_r, I_l)$, where $r < l$. We then select pairs $s_r$ in $\Sigma_{i_r, j_r}$ and $s_l$ in
$\Sigma_{i_l, j_l}$. Note that only the first coordinate of the pair $(I_r, I_l)$ is considered.

$\omega(s_r) = 0$ and $\omega(s_l) = 0$. Obviously, in this case the loop formed by $I_r$ and $I_l$
remains an interior-loop

$((i_r - 1, j_l + 1), [i_r, j_r], (j_r + 1, i_l - 1), [i_l, j_l])$,

whose energy is given by $I(i_r - 1, j_l + 1, j_r + 1, i_l - 1)$.

$\omega(s_r) \geq 1$ and $\omega(s_l) = 0$. In this case, $s_l = \emptyset$. $I_r$ and $I_l$ create a
multi-loop, in which $s_r$ and the substructure $G_{j_r + 1, i_l - 1}$ are nested.

$\omega(s_r) = 0$ and $\omega(s_l) \geq 1$. Completely analogous to the previous case.

$\omega(s_r) \geq 1$ and $\omega(s_l) \geq 1$. In this case, $I_r$ and $I_l$ create a multi-loop, in
which $s_r$, $s_l$ and $G_{j_r + 1, i_l - 1}$ are nested, see Fig. 23.

Accordingly, we select all pairs of structures $(s_r, s_l)$ satisfying

$e(s_r) + e(s_l) = \min \begin{cases}
  I(i_r - 1, j_l + 1, j_r + 1, i_l - 1) \\
  M + 2P_1 + e(OS_{mul}(i_r, j_r)) + (j_l - i_l + 1) \cdot Q_{mul} \\
  M + 2P_1 + e(OS_{mul}(i_l, j_l)) + (j_r - i_r + 1) \cdot Q_{mul} \\
  M + 2P_1 + e(OS_{mul}(i_r, j_r)) + e(OS_{mul}(i_l, j_l)).
\end{cases}$
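The selection for an interval-pair in Case (4) can be illustrated in the same style. The Python sketch
below is written under explicit assumptions: every multi-loop alternative is taken to carry the
penalty $M$ plus two closing-pair scores $P_1$, an unfilled interval contributes $Q_{mul}$ per
unpaired base, and the callbacks `I` and `e_os_mul` (returning `None` when no filling exists) as well
as the function name `best_interior_pair_fill` are hypothetical.

```python
def best_interior_pair_fill(ir, jr, il, jl, I, e_os_mul, M, P1, Q_mul):
    """Cheapest joint filling of the intervals [ir, jr] and [il, jl] of an interior-loop."""
    e_r, e_l = e_os_mul(ir, jr), e_os_mul(il, jl)
    multi = M + 2 * P1                                     # assumed cost shared by multi-loop cases
    candidates = [(I(ir - 1, jl + 1, jr + 1, il - 1), "interior")]   # both intervals empty
    if e_r is not None:                                    # fill I_r, leave I_l unpaired
        candidates.append((multi + e_r + (jl - il + 1) * Q_mul, "fill I_r"))
    if e_l is not None:                                    # fill I_l, leave I_r unpaired
        candidates.append((multi + e_l + (jr - ir + 1) * Q_mul, "fill I_l"))
    if e_r is not None and e_l is not None:                # fill both intervals
        candidates.append((multi + e_r + e_l, "fill both"))
    return min(candidates, key=lambda c: c[0])
```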



Accordingly, we inductively saturate all intervals and, in case of interior loops, interval-pairs,
and thereby derive $OSM(S_{i,j})$. Then we select an energy-minimal $OSM(i, j)$ substructure from the
set of all $OSM(S_{i,j})$ for any skeleton $S_{i,j}$.

As for the construction of $OS(i, j)$ via $OSM(i', j')$, we consider position $i$ in $OS(i, j)$. If
$i$ is paired, then $i$ is contained in some $OSM(i, s)$. Then $OS(i, j)$ induces a substructure
$S_2$ over $\{s + 1, \dots, j\}$. By construction $OS(i, j) = OSM(i, s) \,\dot\cup\, S_2$, whence
$S_2 = OS(s + 1, j)$ and in particular we have

(6.1)  $e(OS(i, j)) = e(OSM(i, s)) + e(OS(s + 1, j))$.

Figure 23. Saturation of an interval contained in an interior-loop, which is obtained
by $I_r$ and $I_l$, where $r < l$.

Suppose next $i$ is unpaired in $OS(i, j)$. Since $e$ is a loop-based energy, we can conclude
$OS(i, j) = \{\emptyset\} \,\dot\cup\, OS(i + 1, j)$, i.e. we have

(6.2)  $e(OS(i, j)) = e(OS(i + 1, j)) + Q$,

where $Q$ represents the energy contribution of a single, unpaired nucleotide. Accordingly, we can
inductively construct $OS(i, j)$ via the criterion

$e(OS(i, j)) = \min \{ e(OS(i + 1, j)) + Q,\; e(OSM(i, s)) + e(OS(s + 1, j)) \}, \quad \forall\, i < s \leq j.$




Figure 24. Constructing $OS(i, j)$: inductive decomposition of the optimal structure,
$OS(i, j)$, into saturated skeleta, $OSM(i, s)$, and unpaired nucleotides.
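The recursion for $e(OS(i, j))$ given by eqs. (6.1) and (6.2) can be sketched as a memoized function.
This is a minimal Python illustration under stated assumptions: the OSM energies are supplied as a
hypothetical dictionary `e_osm` keyed by $(i, s)$, intervals without a saturated skeleton are simply
absent from it, the empty interval has energy 0, and only energies are tracked rather than the vectors
of structures that cross stores.

```python
from functools import lru_cache


def os_energy(e_osm, Q=0.0):
    """Return a function e_os(i, j) implementing the min-recursion over eqs. (6.1) and (6.2)."""

    @lru_cache(maxsize=None)
    def e_os(i, j):
        if i > j:
            return 0.0                       # empty interval
        best = e_os(i + 1, j) + Q            # position i unpaired, eq. (6.2)
        for s in range(i + 1, j + 1):        # position i paired inside OSM(i, s), eq. (6.1)
            e = e_osm.get((i, s))
            if e is not None:
                best = min(best, e + e_os(s + 1, j))
        return best

    return e_os
```

For instance, with `e_osm = {(1, 4): -2.0, (5, 9): -3.0, (2, 9): -4.0}` and `Q = 0`, the optimum over
$[1, 9]$ combines the skeleta over $[1, 4]$ and $[5, 9]$ rather than leaving position 1 unpaired and
using the skeleton over $[2, 9]$.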



Now we can inductively construct the array of structures $OS(i, j)$ and $OSM(i, j)$ via OS and
OSM structures over smaller intervals. As a result, we finally obtain the structure $OS(1, n)$,
i.e. the mfe-structure, see Fig. 25.






Figure 25. Inductive construction of OS and OSM structures (panels: OSM and OS, each showing
step 1, step 2, ...): in the $s$-th step, we first construct $OSM(i, i + s)$ for any
$0 < i < n - s + 1$. We then construct $OS(i, i + s)$, recruiting OSM-structures over intervals of
lengths strictly smaller than $s$.
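The order in which the two arrays are filled can be written down as a simple schedule. The Python
sketch below only enumerates the tasks in the order described for Fig. 25; it assumes that the steps
run over $s = 0, 1, \dots, n - 1$ so that the final task is $OS(1, n)$, and the generator name
`fill_schedule` is hypothetical.

```python
def fill_schedule(n):
    """Yield ('OSM', i, j) and ('OS', i, j) tasks by increasing interval length."""
    for s in range(n):                       # step s handles intervals [i, i + s]
        for i in range(1, n - s + 1):        # 0 < i < n - s + 1
            yield ("OSM", i, i + s)
        for i in range(1, n - s + 1):
            yield ("OS", i, i + s)
```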



7. Synopsis 



After providing the necessary background and context on pseudoknot folding routines and
$k$-noncrossing structures, we discussed in detail in Sections 4, 5 and 6 the three phases of cross,
see Fig. 26. Now that the key ideas are presented, we proceed by integrating and discussing our
results. Cross is an ab initio folding algorithm which is guaranteed to search all 3-noncrossing,
$\sigma$-canonical structures and derives the corresponding loop-based mfe-configuration. A detailed
description of the loop-energies as well as specific implementation particulars on how to generate the
skeleta-trees of Section 5 via a certain matrix construction can be found at

www.combinatorics.cn/cbpc/cross.html 

We remark that the code is being improved and new features are being added; for instance, we
currently work towards deriving the partition function version of cross, the generalization to
arbitrary $k$ and a fully parallel implementation. The design of cross is fundamentally different
from that of the pseudoknot DP-routines found in the literature, a case in point being the algorithm
of [ID], as outlined in Section 1. We showed that the latter cannot create any nonplanar
3-noncrossing structure and furthermore cannot control the maximal number of mutually crossing arcs
(crossing number). Consequently, DP-routines generate pseudoknot complexity by "just" increasing this
very crossing number. The class of nonplanar 3-noncrossing structures illustrates, however, that
structural complexity is not tantamount to the crossing number.

Figure 26. An outline of cross: the generation of motifs (I), the construction of skeleta-trees
rooted in irreducible shadows (II) and the saturation (III), during which, via DP-routines, optimal
fillings of skeleta-intervals are derived.

One key difference to any other pseudoknot folding algorithm is the fact that cross has a
transparent, combinatorially specified output class. Otherwise this feature exists only in secondary
structure folding algorithms, where it is implied by construction. This specification is based on a
novel combinatorial class, the $k$-noncrossing RNA structures, and their exact and asymptotic
enumeration [Ml] [32]. The concept of $k$-noncrossing RNA structures is based on the combinatorial
work of Chen et al. [3] [7]. The implications of this framework are profound: for
$k = 3, 4, \dots, 6$ it is possible, employing central limit theorems for $k$-noncrossing structures
[26] [22], to derive a variety of generic properties of sequence-structure maps into RNA pseudoknot
structures, irrespective of energy parameters [37] [21].
Furthermore, cross is capable of generating novel classes of pseudoknots. Even in its current
implementation, i.e. restricted to 3-noncrossing structures, it can generate any nonplanar
configuration. As mentioned already, the extension of cross to a version capable of folding any
$k$-noncrossing structure is work in progress. In this context, assertion (b) of Proposition 5.1
shows that novel constructions are required for efficient folding. Cross is by design an algorithm of
exponential time complexity by virtue of its construction of shadows and skeleta-trees. Only in its
saturation phase does it employ vector versions of DP-routines. Beyond the asymptotic analysis of
motifs, given in Section 4, a detailed study of the performance of cross is work in progress. It
appears, however, that the folding times of random sequences are exponentially distributed. In
Fig. 27 we display the logarithm of the mean folding time of 1000 random sequences. These data
suggest exponential times with exponential growth rates of $\approx 1.254 = e^{0.2263}$ and
$\approx 1.146 = e^{0.1364}$ for 3-canonical and 4-canonical structures, respectively. In particular,
a random sequence of length 100 folded via a single core, 2.2-GHz CPU exhibits a mean folding time of
279 seconds with standard deviation of 267744 seconds.

Figure 27. Mean folding times: we display the logarithm of the folding times of 1000
random sequences as a function of the sequence length. For 3-canonical and 4-canonical
structures the linear fits are given by $0.2263n - 19.796$ (left) and $0.1364n - 13.659$ (right),
respectively.
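As a consistency check, the growth rates can be recomputed from the slopes of the linear fits reported
in the caption of Fig. 27. The short Python sketch below assumes that the logarithm used there is the
natural logarithm, so that a slope $a$ corresponds to a growth rate $e^{a}$ per additional nucleotide;
the dictionary `fits` and the function `growth_rate` are introduced here for illustration only.

```python
import math

# (slope, intercept) of the linear fits to the log folding times, from the caption of Fig. 27
fits = {
    "3-canonical": (0.2263, -19.796),
    "4-canonical": (0.1364, -13.659),
}


def growth_rate(slope):
    """Multiplicative increase in folding time per additional nucleotide (natural log assumed)."""
    return math.exp(slope)


rates = {name: growth_rate(slope) for name, (slope, _) in fits.items()}
```

Under this assumption the slope 0.2263 gives a rate of about 1.254 and the slope 0.1364 a rate of
about 1.146.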